Indexing and Selection

Operation                      Syntax          Result
Select column                  df[col]         Series
Select row by label            df.loc[label]   Series
Select row by integer          df.iloc[loc]    Series
Select rows                    df[start:stop]  DataFrame
Select rows with boolean mask  df[mask]        DataFrame

Documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html


In [ ]:
import pandas as pd
import numpy as np

In [ ]:
# one list per column; the dict keys become the column labels
produce_dict = {
    'veggies': ['potatoes', 'onions', 'peppers', 'carrots'],
    'fruits': ['apples', 'bananas', 'pineapple', 'berries'],
}
produce_df = pd.DataFrame(produce_dict)
produce_df

select a column using a dictionary-like string key
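
A minimal sketch of what this cell might contain, rebuilding the produce DataFrame so the snippet runs on its own. The variable name `fruits` is only illustrative.

```python
import pandas as pd

produce_df = pd.DataFrame({
    'veggies': ['potatoes', 'onions', 'peppers', 'carrots'],
    'fruits': ['apples', 'bananas', 'pineapple', 'berries'],
})

# A single string key works like a dictionary lookup and returns a Series
fruits = produce_df['fruits']
print(type(fruits).__name__)
print(fruits.tolist())
```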

In [ ]:

list of strings as index (note: double square brackets)
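
One possible illustration, assuming the same produce data as above. Passing a list of column names (hence the double brackets) returns a DataFrame, even when the list holds a single name. The names `both_cols` and `one_col` are made up for this example.

```python
import pandas as pd

produce_df = pd.DataFrame({
    'veggies': ['potatoes', 'onions', 'peppers', 'carrots'],
    'fruits': ['apples', 'bananas', 'pineapple', 'berries'],
})

# Outer brackets index the DataFrame; inner brackets are a plain Python list
both_cols = produce_df[['fruits', 'veggies']]
one_col = produce_df[['fruits']]

print(list(both_cols.columns))      # columns come back in the order requested
print(type(one_col).__name__, one_col.shape)
```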

In [ ]:

select row using integer index
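
A sketch of row selection by integer position with `.iloc`, again assuming the produce data from the earlier cell. The result is a Series whose index is the column labels.

```python
import pandas as pd

produce_df = pd.DataFrame({
    'veggies': ['potatoes', 'onions', 'peppers', 'carrots'],
    'fruits': ['apples', 'bananas', 'pineapple', 'berries'],
})

# Position 2 is the third row; the row comes back as a Series
row = produce_df.iloc[2]
print(row['veggies'], row['fruits'])
print(list(row.index))
```

Because this DataFrame has the default integer index, `produce_df.loc[2]` would return the same row here; the two differ once the row labels are not 0, 1, 2, and so on.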

In [ ]:

select rows using integer slice
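
A short sketch of slicing rows, assuming the same produce data. As with Python lists, the stop position is excluded. The name `middle_rows` is illustrative.

```python
import pandas as pd

produce_df = pd.DataFrame({
    'veggies': ['potatoes', 'onions', 'peppers', 'carrots'],
    'fruits': ['apples', 'bananas', 'pineapple', 'berries'],
})

# Rows at positions 1 and 2; position 3 is not included
middle_rows = produce_df[1:3]
print(middle_rows)
```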

In [ ]:


In [ ]:

+ is overloaded as the concatenation operator for string columns
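
As a possible illustration with the produce data, adding two string columns joins their values element by element. The separator `' and '` and the name `combined` are assumptions made for this example.

```python
import pandas as pd

produce_df = pd.DataFrame({
    'veggies': ['potatoes', 'onions', 'peppers', 'carrots'],
    'fruits': ['apples', 'bananas', 'pineapple', 'berries'],
})

# For string columns, + concatenates row by row instead of adding numbers
combined = produce_df['veggies'] + ' and ' + produce_df['fruits']
print(combined.tolist())
```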

In [ ]:

Data alignment and arithmetic

Arithmetic between DataFrame objects automatically aligns on both the columns and the index (row labels).

Note where NaN appears in the result: any row or column label present in only one of the two DataFrames produces NaN.


In [ ]:
df = pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(np.random.randn(7, 3), columns=['A', 'B', 'C'])
sum_df = df + df2
sum_df
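
One way to check where the NaN values land, sketched with a fixed random seed (an assumption added here so the run is repeatable). Column D and rows 7 to 9 exist only in the first DataFrame, so those positions have nothing to align with.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed, not part of the original cell
df = pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(np.random.randn(7, 3), columns=['A', 'B', 'C'])
sum_df = df + df2

nan_mask = sum_df.isna()

# Column D is missing from df2, so every value in it is NaN
all_d_nan = bool(nan_mask['D'].all())
# Row labels 7, 8, 9 are missing from df2, so those rows are entirely NaN
tail_rows_nan = bool(nan_mask.loc[7:].all().all())
# Where both frames share a row and a column, the sum is a real number
overlap_has_nan = bool(nan_mask.loc[:6, ['A', 'B', 'C']].any().any())

print(all_d_nan, tail_rows_nan, overlap_has_nan)
```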

Boolean indexing


In [ ]:


In [ ]:

first, build a boolean mask that flags the rows whose value in column B is less than zero

then, use the mask to select those rows, keeping all of their columns in the resulting data set
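
The two steps can be sketched as follows, using a seeded random DataFrame shaped like the one above. The seed and the names `mask` and `negative_b` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed for a repeatable example
df = pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D'])

# Step 1: a boolean Series, True where column B is negative
mask = df['B'] < 0

# Step 2: indexing with the mask keeps the True rows and all four columns
negative_b = df[mask]
print(negative_b)
```

The same selection is often written in one line as `df[df['B'] < 0]`.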


In [ ]:


In [ ]:

isin method
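
A minimal sketch of `isin` with the produce data: it tests each value for membership in a list and returns a boolean mask, which can then be used to select rows. The list of fruits chosen here is arbitrary.

```python
import pandas as pd

produce_df = pd.DataFrame({
    'veggies': ['potatoes', 'onions', 'peppers', 'carrots'],
    'fruits': ['apples', 'bananas', 'pineapple', 'berries'],
})

# True for rows whose fruit appears in the list
fruit_mask = produce_df['fruits'].isin(['apples', 'berries'])

# Use the mask like any other boolean index
selected = produce_df[fruit_mask]
print(selected)
```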

In [ ]:

where method
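
A small sketch of `where`, using a hand-written DataFrame (hypothetical values) so the result is easy to follow. Unlike boolean indexing, `where` keeps the original shape: values that fail the condition become NaN, or the value passed as `other`.

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, -2.0, 3.0], 'B': [-4.0, 5.0, -6.0]})

# Keep values where the condition holds; everything else becomes NaN
positive_only = df.where(df > 0)

# Supply a replacement value instead of NaN
filled = df.where(df > 0, other=0.0)

print(positive_only)
print(filled)
```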

In [ ]: